IS608 Project -- James Quacinella

Goal

The goal of this project is to create cloropleth map of energy consumption on a per-county level. Data from the EIA gives data on utility level demand and what counties utilities deliver power to. We use data, plus Census data on population, to come up with an estimate of per-county energy consumption. This methodology will be explained below.

Data Sources

  • EIA Form 618 - This data folder has many Excel files, two of which are valuable
    • Service Territories maps each County to all the utility IDs that serves it
    • Retail data maps the amount of energy produced by each utlitity
  • Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2013 from U.S. Census Bureau, Population Division

High Level Code Overview

Here is some pseudo-code decribing the whole process, from raw data to final map output

  • Build a mapping from county ID to its 2012 population (2012 being latest data from the EIA)
  • Build a mapping from utility ID to the total energy output of that utility
  • Build a mapping from utility ID to list of county IDs is serves
  • Derive a mapping from utility ID to the sum of county populations of all the counties it serves
  • For each county
    • For each utility that serves this county, calculate an estimate of the county's consumption based of the ratio of the county's population to the total population the utility serves.
  • Output the final county ID to consumption estimate to CSV. This will be served via a webserver to D3

Issues

  • The EIA data maps the Utility to a County by name, which did not eactly match the names of counties in the census data. This required some manual cleanup, which are being handled by the initial load data function.
  • The EIA data does not have more granular data about how much each county consumes of its total energy output. A methodlolgy needed to be developed to approximate this data. I have contacted the EIA for comment, with no response yet.

Code

Globals and Constants


In [93]:
import csv

# Column indicies for EIA data
UTILITY = 1
STATE = 3
COUNTY = 4

# Column indicies for EIA retail data
UTILITY_ID = 1
CONSUMPTION = 21

# Column indicies for Census data
ID = 1
DESC = 2
POP2012 = 7

# Global variables
countyToUtility = {}        # Mapping from county number to a list of utilities serving it
utilityToCounty = {}        # Mapping from utility id to a list of counties it serves
countyToPopulation = {}     # Mapping from county number to a population from census data
nameToID = {}               # Mapping from county name to the county code
utilityToConsumption = {}   # Mapping the utility id to the total consumption in mWh
utilityToPopulation = {}    # Mapping the utility id to the total county population it serves
countyToConsumption = {}    # Mapping the final result of county ID to consumption in mWh


# Constants
YEAR = 2012
PATH_TO_RETAIL_DATA = "data/f8612012/retail_sales_%s.csv" % YEAR  # Path to the defined utility service territories
PATH_TO_SERVICE_DATA = "data/f8612012/service_territory_%s.csv" % YEAR  # Path to the defined utility service territories
PATH_TO_POPULATION_DATA = "data/PEP_2013_PEPANNRES/PEP_2013_PEPANNRES_with_ann_with_changes.csv"
STATES = { 'AK': 'Alaska','AL': 'Alabama','AR': 'Arkansas','AS': 'American Samoa','AZ': 'Arizona','CA': 'California','CO': 'Colorado','CT': 'Connecticut','DC': 'District of Columbia','DE': 'Delaware','FL': 'Florida','GA': 'Georgia','GU': 'Guam','HI': 'Hawaii','IA': 'Iowa','ID': 'Idaho','IL': 'Illinois','IN': 'Indiana','KS': 'Kansas','KY': 'Kentucky','LA': 'Louisiana','MA': 'Massachusetts','MD': 'Maryland','ME': 'Maine','MI': 'Michigan','MN': 'Minnesota','MO': 'Missouri','MP': 'Northern Mariana Islands','MS': 'Mississippi','MT': 'Montana','NA': 'National','NC': 'North Carolina','ND': 'North Dakota','NE': 'Nebraska','NH': 'New Hampshire','NJ': 'New Jersey','NM': 'New Mexico','NV': 'Nevada','NY': 'New York','OH': 'Ohio','OK': 'Oklahoma','OR': 'Oregon','PA': 'Pennsylvania','PR': 'Puerto Rico','RI': 'Rhode Island','SC': 'South Carolina','SD': 'South Dakota','TN': 'Tennessee','TX': 'Texas','UT': 'Utah','VA': 'Virginia','VI': 'Virgin Islands','VT': 'Vermont','WA': 'Washington','WI': 'Wisconsin','WV': 'West Virginia','WY': 'Wyoming'}

Load the Data

This function reads in the census data, and loads a dictionary that maps the county name to its county ID (nameToID), and another dictionary that maps from the county ID to the population estimate in 2012 (countyToPopulation). The county name is constructed as "state_countyname", all lowercase


In [94]:
def loadCensusData():
    ''' Loads the mapping from county number to population into 
    'countyToUPopulation' and a name to ID mapping from the PATH_TO_POPULATION_DATA file. '''
    f = open(PATH_TO_POPULATION_DATA)
    reader = csv.reader(f)
    reader.next()
    reader.next()
    for row in reader:
        # Grab the important parts of the data
        id = int(row[ID])
        desc = row[DESC]
        pop2012 = row[POP2012]

        # Derice the 'county key' from the description column
        (county, state) = desc.split(',')
        county = county.lower().replace("county", "").replace(".", "").replace(" ", "")
        state = state.lower().replace(' ', '')
        key = state + '_' + county

        # correction to Lousiana county names
        if state == "louisiana":
            key = key.replace("parish", "")

        # Setup the two mappings
        nameToID[key] = id
        countyToPopulation[id] = int(pop2012)

This function loads a dictionary, countyToUtility, that maps the county ID to the list of utilities that serve it (and a reverse mapping)


In [95]:
def loadCountytoUtilityData():
    ''' Loads the mapping from county number to utilies into 
    'countyToUtility' from the PATH_TO_SERVICE_DATA file. '''
    f = open(PATH_TO_SERVICE_DATA)
    reader = csv.reader(f)
    reader.next()
    for row in reader:
        state = STATES[ row[STATE].upper() ].lower().replace(' ', '')
        county = row[COUNTY].lower().replace(' ', '').replace('.', '')
        key = state + '_' + county
        utilityID = int(row[UTILITY])

        try:
            if nameToID[key] in countyToUtility:
                countyToUtility[nameToID[key]].add(utilityID)
            else:
                #if key not in nameToID:
                #    print "key %s not found" % key
                countyToUtility[nameToID[key]] = set([utilityID])

            if utilityID in utilityToCounty:
                utilityToCounty[utilityID].add(nameToID[key])
            else:
                utilityToCounty[utilityID] = set([nameToID[key]])
        except Exception as e:
            pass

This function loads the EIA data as a mapping from utility ID to the consumption data


In [96]:
def loadUtilityConsumptionData():
    f = open(PATH_TO_RETAIL_DATA)
    reader = csv.reader(f)
    reader.next()
    reader.next()
    reader.next()
    for row in reader:
        id = int(row[UTILITY_ID])
        consumption = int(row[CONSUMPTION].replace(',', ''))
        utilityToConsumption[id] = consumption

We now define a function to map the utility ID to the sum of the populations of the counties it serves. This will help in the stimation of how much consumption each county was responsible for:


In [ ]:
def deriveUtilityPopulation():
    for utility, counties in utilityToCounty.items():
        totPopulation = 0
        for county in counties:
            totPopulation += countyToPopulation[county]
        utilityToPopulation[utility] = totPopulation

Now, for the main event: lets make our estimate for the consumption estimate of each county:


In [97]:
def calculateCountyConsumption():
    for county, utilities in countyToUtility.items():
        try:
            countyToConsumption[county] = sum([((countyToPopulation[county] / utilityToPopulation[utility]) * utilityToConsumption[utility]) for utility in utilities])
        except KeyError:
            countyToConsumption[county] = 0

Ok, so lets load the data and take a look at what we have:


In [98]:
# Start by loading the census data and the service territory mapping
loadCensusData()
loadCountytoUtilityData()
loadUtilityConsumptionData()
deriveUtilityPopulation()
calculateCountyConsumption()

In [99]:
# Print random sample of county name to list of utility IDs
for county, utilityList in countyToUtility.items()[0:10]:
    print "%s is served by utility IDs %s" % (county, list(utilityList))


15003 is served by utility IDs [19547]
54065 is served by utility IDs [15263]
42009 is served by utility IDs [40222]
30007 is served by utility IDs [23586]
30043 is served by utility IDs [23586]
9003 is served by utility IDs [6207]
41003 is served by utility IDs [40437]
41005 is served by utility IDs [15248]
41007 is served by utility IDs [28541]
13095 is served by utility IDs [18305]

In [100]:
# Print random sample of county name to list of utility IDs
for name, countyID in nameToID.items()[0:10]:
    print "%s (%s) had a population of %s" % (name, countyID, countyToPopulation[countyID])


kansas_elk (20049) had a population of 2674
alabama_wilcox (1131) had a population of 11406
florida_washington (12133) had a population of 24854
newmexico_mckinley (35031) had a population of 72726
southcarolina_cherokee (45021) had a population of 55760
texas_shackelford (48417) had a population of 3368
westvirginia_grant (54023) had a population of 11814
georgia_candler (13043) had a population of 11107
illinois_pope (17151) had a population of 4271
missouri_marion (29127) had a population of 28818

In [101]:
# Print random sample of utility ID to consumption data
for utilityID, consumption in utilityToConsumption.items()[0:10]:
    print "County ID %s consumed %s mWh" % (utilityID, consumption)


County ID 8198 consumed 696574 mWh
County ID 8199 consumed 102237 mWh
County ID 57354 consumed 2060 mWh
County ID 24590 consumed 423362 mWh
County ID 8210 consumed 535614 mWh
County ID 8212 consumed 307219 mWh
County ID 57368 consumed 1349357 mWh
County ID 16416 consumed 73074 mWh
County ID 8226 consumed 145827 mWh
County ID 16420 consumed 102694 mWh

In [102]:


In [102]:


In [102]:


In [102]: